Abstract

We consider signal transduction in a simple neuronal model featuring intrinsic noise.
The presence of noise limits the precision of neural responses and impacts the quality
of neural signal transduction. We assess the signal transduction quality in relation to
the level of noise, and show it to be maximized by a non-zero level of noise, analogous
to the stochastic resonance effect. The quality enhancement occurs for a finite range
of stimuli to a single neuron; we show how to construct networks of neurons that
extend the range. The range increases more rapidly with network size when we make
use of heterogeneous populations of neurons with a variety of thresholds, rather than
homogeneous populations of neurons all with the same threshold. The limited
precision of neural responses thus can have a direct effect on the optimal network
structure, with diverse functional properties of the constituent neurons supporting an
economical information processing strategy that reduces the metabolic costs of
handling a broad class of stimuli.

1 Introduction

Neural network models are often constructed of simple units. Typically, model neurons
have a particular threshold or bias, and saturate to a fixed value for either strong or weak
inputs. Some such models can in fact be derived by systematic approximations of more
detailed models such as the Hodgkin-Huxley model (Abbott and Kepler, 1990); many other
models are derived from alternative heuristic or phenomenological assumptions. Networks
of even the simplest models are well known to be capable of representing complex
functions.

In this chapter, we investigate the degree to which the simple dynamics of an individual
unit limit the inputs that can be processed, and how these limitations can be overcome. To
do this, we consider the response of model neurons to a variety of stimuli. The precision
of the responses of biological neurons has been shown to be rather limited, typically in
the range of a few bits for each action potential or "spike" (Rieke et al., 1997). We mimic
this limited precision in the model neurons by including noise in the systems. We focus
on intrinsic neuronal noise which has identical statistics for each of the units in the neural
network.

Noise is usually viewed as limiting the sensitivity of a system, but nonlinear systems
can react to noise in surprising ways. Perhaps the best known of these is the phenomenon
known as stochastic resonance (SR), wherein an optimal response to weak or
subthreshold signals is observed when a non-zero level of noise is added to the system
(Gammaitoni et al., 1998). For example, a noise-free, subthreshold neuronal input
can occasionally become suprathreshold when noise is added, allowing some character
of the input signal to be detected. SR has been observed and investigated in many
systems, ranging from resonant cavities to neural networks to the onset of ice ages
(see, e.g., Bezrukov and Vodyanoy, 1997; Gailey et al., 1997; Goychuk and Hanggi, 2000;
Jung and Shuai, 2001; Moss and Pei, 1995; Schmid et al., 2001; Wenning and Obermayer,
2003; Wiesenfeld and Moss, 1995).

Collins et al. (1995b) showed, in a summing network of identical FitzHugh-Nagumo
model neurons, that an emergent property of SR in multi-component systems is that the
enhancement of the response becomes independent of the power of the noise. This allows
networks of elements with finite precision to take advantage of SR for diverse inputs. To
build upon the findings of Collins et al., we consider networks of simpler model neurons, but
these are allowed to have different dynamics. In particular, we examine noisy
McCulloch-Pitts (McP) neurons (Hertz et al., 1991) with a distribution of thresholds. We construct
heterogeneous networks that perform better than a homogeneous network with the same
number of noisy McP neurons and similar network architecture.



2 Neural Network Model 



To investigate the effect of noise on signal transduction in networks of thresholding units,
we consider a network of noisy McCulloch-Pitts (McP) neurons. The McP neuron is
perhaps the simplest neural model, being only a simple thresholding unit. When the total
input to a neuron (signal plus noise) exceeds its threshold, the neuron activates, firing an
action potential or "spike." Formally, the activation state $a_i$ of neuron $i$ in response to some
stimulus $S$ can be expressed as

$$ a_i(S) = u(S - S_0) , \tag{1} $$

where $S_0$ is the neuron's threshold and $u$ is the Heaviside step function, defined as

$$ u(x) = \begin{cases} 1 & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2} $$

We model the limited precision of neurons by including noise as an intrinsic feature of the
neurons (see footnote 1), so that eq. (1) becomes

$$ a_i(S) = u(S - S_0 + \eta_i) , \tag{3} $$

where $\eta_i$ is zero-mean, i.i.d. (independent, identically distributed) Gaussian noise with
variance $\sigma^2$. We assume the noise distributions to be identical for all the McP neurons.

The network architecture is simple: an input layer of N noisy McP neurons is connected
to a single linear output neuron. Each synaptic weight is of identical and unitary strength, so
the output neuron calculates the sum of the N input neuron activation states as its response
$R_N(S)$:

$$ R_N(S) = \sum_{i=1}^{N} a_i(S) . \tag{4} $$

Each input unit is presented the same analog signal, but with a different realization of the 
intrinsic neuronal noise. 

An important special case is when there is just a single input neuron (N = 1). Since 
the output neuron is a summing unit, its response is just the response of the single input 
neuron, i.e., $R_N(S) = a_1(S)$. In this chapter, we will use "single neuron" synonymously
with "network having only a single input neuron." 
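
As an illustration of eqs. (3) and (4), the network can be simulated directly. The following is a minimal sketch rather than the authors' implementation; the function names, the use of Python's `random` module, and the seed are assumptions made here for illustration.

```python
import random


def heaviside(x):
    """Step function u(x) of eq. (2): 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def network_response(stimulus, n_neurons, sigma, threshold=0.0, rng=random):
    """One realization of R_N(S): the number of noisy McP neurons that fire.

    Each neuron sees the same stimulus but an independent Gaussian noise
    sample with standard deviation sigma (eqs. 3 and 4).
    """
    return sum(
        heaviside(stimulus - threshold + rng.gauss(0.0, sigma))
        for _ in range(n_neurons)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
    for s in (-1.0, 0.0, 1.0):
        print(s, network_response(s, n_neurons=1000, sigma=1.0, rng=rng))
```

With `sigma=0.0` the sketch reduces to the noise-free McP network, in which all input neurons fire or remain quiescent together.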

3 Network Response 

In this section, we consider the response $R_N$ of the network. Due to the specific choices
of neural model and network architecture made in section 2, the resulting neural networks
are quite tractable mathematically. A great deal of formal manipulation is thus possible,
including exact calculations of the expectation value and variance of the network response.

We initially focus on homogeneous networks with identical input neurons, including the
special case of a single input neuron. The stimuli can be chosen without loss of generality
so that $S_0 = 0$. The results for $S_0 \neq 0$ can be recovered by a straightforward translation
along the $S$-axis.

The behavior for the standard, noise-free McP neurons is trivial, with all input neu- 
rons synchronously firing or remaining quiescent. However, considerably more interesting 
behavior is possible for noisy McP neurons: subthreshold signals have some chance of 
causing a neuron to fire, while suprathreshold signals have some chance of failing to cause 
the neuron to fire. 

For a network with N input neurons and the network architecture discussed above
(section 2), the response $R_N$ of the output neuron is just the number of input neurons that
fire. The probability $p(S; \sigma^2)$ of any neuron firing is

$$ p(S; \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-S}^{\infty} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx , \tag{5} $$

while the probability $q(S; \sigma^2)$ for the neuron to remain quiescent is

$$ q(S; \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{-S} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx . \tag{6} $$

Combining eqs. (5) and (6) gives $q(S; \sigma^2) + p(S; \sigma^2) = 1$, as expected.

1 Although we conceptually take the noise as intrinsic to the neuron, the model we use is formally equivalent
to a noise-free neuron subjected to a noisy stimulus.

Given eqs. (5) and (6), the expectation value $\langle R \rangle (S; \sigma^2)$ and variance $\sigma_R^2 (S; \sigma^2)$ when
N = 1 are

$$ \langle R \rangle (S; \sigma^2) = p(S; \sigma^2) \tag{7} $$

$$ \sigma_R^2 (S; \sigma^2) = p(S; \sigma^2) \, q(S; \sigma^2) . \tag{8} $$

Since the noise is independent, the probability of different input neurons firing is also
independent and the expected value and variance of the output neuron activation are seen to
be

$$ \langle R_N \rangle (S; \sigma^2) = N p(S; \sigma^2) \tag{9} $$

$$ \sigma_{R_N}^2 (S; \sigma^2) = N p(S; \sigma^2) \, q(S; \sigma^2) . \tag{10} $$

The dependence of $\langle R \rangle (S; \sigma^2)$ and $\sigma_R^2 (S; \sigma^2)$ on the stimulus $S$ and the noise variance $\sigma^2$
is shown in fig. 1.
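
The integrals in eqs. (5) and (6) are Gaussian tail probabilities, so they can be evaluated with the complementary error function. The sketch below is one way to compute eqs. (5) through (10) numerically; the function names are illustrative assumptions, and the threshold is taken as $S_0 = 0$ unless stated otherwise.

```python
import math


def fire_probability(stimulus, sigma, threshold=0.0):
    """p(S; sigma^2) of eq. (5): probability that Gaussian noise exceeds -(S - S_0)."""
    z = (stimulus - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(-z)


def quiescent_probability(stimulus, sigma, threshold=0.0):
    """q(S; sigma^2) of eq. (6): probability that the neuron stays silent."""
    z = (stimulus - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def response_mean_and_variance(stimulus, sigma, n_neurons=1):
    """Mean N*p and variance N*p*q of the summed response (eqs. 7 to 10)."""
    p = fire_probability(stimulus, sigma)
    q = quiescent_probability(stimulus, sigma)
    return n_neurons * p, n_neurons * p * q


if __name__ == "__main__":
    for s in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(s, response_mean_and_variance(s, sigma=1.0, n_neurons=1000))
```

Computing q with its own `erfc` call, rather than as 1 - p, avoids loss of precision when p is very close to one.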



4 Decoding the Network Output 

In this section, we explore the ability of the neural circuit to serve as a signal transducer.
We identify limits on the signal transduction capability by decoding the state of the output
neuron to reproduce the input stimulus. Near the threshold value $S_0$, this gives rise to linear
decoding rules. The basic approach is similar to the "reverse reconstruction" using linear
filtering that has been applied with great effect to the analysis of a number of biological
systems (see, e.g., Bialek and Rieke, 1992; Bialek et al., 1991; Prank et al., 2000; Rieke et al.,
1997; Theunissen et al., 1996).

We expand the expected output $\langle R_N \rangle$ to first order near the threshold (i.e., $S \to 0$), giving

$$ \langle R_N \rangle = \frac{N}{2} + \frac{N}{\sqrt{2\pi\sigma^2}} S + O(S^2) . \tag{11} $$

An example of the linear approximation is shown in fig. 2.

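Dropping the higher order terms and inverting eq. (11) gives a linear decoding rule of
the form

$$ \hat{S}_N = \sqrt{2\pi\sigma^2} \left( \frac{R_N}{N} - \frac{1}{2} \right) , \tag{12} $$

where $\hat{S}_N$ is the estimate of the input stimulus. Combining eqs. (9) and (12), we can show
that

$$ \langle \hat{S}_N \rangle (S; \sigma^2) = \sqrt{2\pi\sigma^2} \left( p(S; \sigma^2) - \frac{1}{2} \right) . \tag{13} $$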

The expected value of $\hat{S}_N$ is thus seen to be independent of N; for notational simplicity, we
drop the subscript and write $\langle \hat{S} \rangle$. Examples of $\langle \hat{S} \rangle$ are shown in fig. 3 for several values
of the variance $\sigma^2$. Note that, as the noise variance increases, the expected value of the
estimated stimulus closely matches the actual stimulus over a broader range.
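
The linear decoding rule and its expectation can be sketched in a few lines. This is an illustrative example under the stated model with $S_0 = 0$, not the authors' code; the names `decode` and `expected_estimate` are assumptions introduced here.

```python
import math


def decode(response, n_neurons, sigma):
    """Linear decoding rule of eq. (12): estimate S from the summed response R_N."""
    return math.sqrt(2.0 * math.pi) * sigma * (response / n_neurons - 0.5)


def expected_estimate(stimulus, sigma):
    """Expected decoded stimulus of eq. (13); it does not depend on N.

    Uses p(S; sigma^2) - 1/2 = erf(S / (sigma * sqrt(2))) / 2.
    """
    width = math.sqrt(2.0 * math.pi) * sigma
    return width * 0.5 * math.erf(stimulus / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    # Larger noise keeps the expected estimate close to S over a wider range.
    for sigma in (0.5, 1.0, 2.0):
        print(sigma, [round(expected_estimate(s, sigma), 3) for s in (0.5, 1.0, 2.0)])
```

For stimuli far above threshold the expected estimate saturates at half of $\sqrt{2\pi\sigma^2}$, which is why the accurately decoded range grows with the noise level.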

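We must also consider the uncertainty of the value decoded from the network response.
This leads to a total decoding error $\Delta S_N$ with the form

$$ \Delta S_N^2 (S; \sigma^2) = \left\langle \left( \hat{S}_N - S \right)^2 \right\rangle = \epsilon^2 (S; \sigma^2) + \sigma_{\hat{S}_N}^2 (S; \sigma^2) , \tag{14} $$

where

$$ \epsilon (S; \sigma^2) = \langle \hat{S} \rangle (S; \sigma^2) - S \tag{15} $$

$$ \sigma_{\hat{S}_N}^2 (S; \sigma^2) = \left\langle \left( \hat{S}_N - \langle \hat{S} \rangle (S; \sigma^2) \right)^2 \right\rangle = \frac{2\pi\sigma^2}{N} \, p(S; \sigma^2) \, q(S; \sigma^2) . \tag{16} $$

The expected difference $\epsilon (S; \sigma^2)$ and the decoding variance $\sigma_{\hat{S}_N}^2 (S; \sigma^2)$ are shown in
fig. 4.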
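
The decomposition in eqs. (14) through (16) can be evaluated directly from the firing probabilities. The following sketch assumes $S_0 = 0$ and uses a hypothetical helper name, `decoding_error`; it returns the expected difference, the decoding variance, and their combination into the total squared error.

```python
import math


def decoding_error(stimulus, sigma, n_neurons):
    """Return (epsilon, decoding variance, total squared error) per eqs. (14)-(16)."""
    z = stimulus / (sigma * math.sqrt(2.0))
    p = 0.5 * math.erfc(-z)  # eq. (5)
    q = 0.5 * math.erfc(z)   # eq. (6)
    two_pi_var = 2.0 * math.pi * sigma ** 2
    epsilon = math.sqrt(two_pi_var) * (p - 0.5) - stimulus  # eq. (15)
    variance = two_pi_var * p * q / n_neurons               # eq. (16)
    return epsilon, variance, epsilon ** 2 + variance       # eq. (14)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, decoding_error(stimulus=1.0, sigma=1.0, n_neurons=n))
```

Only the variance term shrinks with N; the expected difference is a property of the linear decoder and is unaffected by adding identical neurons.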

5 The Role of Noise 

The noisy nature of the neurons has a striking and counter-intuitive effect on the
properties of the activation state: increasing noise improves signal transmission, as seen in figs. 3
and 4(a). This effect is analogous to the stochastic resonance effect (Gammaitoni et al.,
1998). SR can be informally understood as the noise sometimes driving a nominally
subthreshold signal across the threshold, causing the neuron to fire. Signals close to the
threshold will more frequently cross the threshold, giving a stronger response than signals far
from the threshold.

There are several properties of the activation probabilities that we can derive from
eqs. (5) and (6) and that we will find useful for understanding the role of noise in the
neural behavior. First, there is a scaling property with the form

$$ p(S; \sigma^2) = p(\alpha S; \alpha^2 \sigma^2) , \tag{17} $$

where $\alpha > 0$. Second, there is a reflection property with the form

$$ p(S; \sigma^2) = q(-S; \sigma^2) . \tag{18} $$

As the activation probabilities are at the core of essentially all the equations in this chapter, 
the scaling and reflection properties will be broadly useful to us. 

The scaling and reflection properties of the activation probabilities can be used to derive
similar properties of the neural responses. The statistics of the neural responses obey the
relations

$$ \langle R_N \rangle (S; \sigma^2) = 1 - \langle R_N \rangle (-S; \sigma^2) \tag{19} $$

$$ \langle R_N \rangle (S; \sigma^2) = \langle R_N \rangle (\alpha S; \alpha^2 \sigma^2) \tag{20} $$

$$ \sigma_{R_N}^2 (S; \sigma^2) = \sigma_{R_N}^2 (-S; \sigma^2) \tag{21} $$

$$ \sigma_{R_N}^2 (S; \sigma^2) = \sigma_{R_N}^2 (\alpha S; \alpha^2 \sigma^2) , \tag{22} $$

where $\alpha > 0$. Important corollaries of these relations are that $\langle R_N \rangle (S; \sigma^2) =
\langle R_N \rangle (-1; (\sigma/S)^2)$ for all $S < 0$, $\langle R_N \rangle (S; \sigma^2) = \langle R_N \rangle (1; (\sigma/S)^2)$ for all $S > 0$, and
$\sigma_{R_N}^2 (S; \sigma^2) = \sigma_{R_N}^2 (1; (\sigma/S)^2)$ for all $S \neq 0$. It is thus necessary to consider only one
subthreshold stimulus and one suprathreshold stimulus in order to understand the impact of
noise on the neural responses; see fig. 5.

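Similarly, properties of the statistics for the estimated input $\hat{S}$ can be derived, giving

$$ \langle \hat{S} \rangle (-S; \sigma^2) = -\langle \hat{S} \rangle (S; \sigma^2) \tag{23} $$

$$ \langle \hat{S} \rangle (\alpha S; \alpha^2 \sigma^2) = \alpha \langle \hat{S} \rangle (S; \sigma^2) \tag{24} $$

$$ \epsilon (-S; \sigma^2) = -\epsilon (S; \sigma^2) \tag{25} $$

$$ \epsilon (\alpha S; \alpha^2 \sigma^2) = \alpha \, \epsilon (S; \sigma^2) \tag{26} $$

$$ \sigma_{\hat{S}_N}^2 (-S; \sigma^2) = \sigma_{\hat{S}_N}^2 (S; \sigma^2) \tag{27} $$

$$ \sigma_{\hat{S}_N}^2 (\alpha S; \alpha^2 \sigma^2) = \alpha^2 \sigma_{\hat{S}_N}^2 (S; \sigma^2) , \tag{28} $$

where $\alpha > 0$. Eqs. (23) through (26) imply that $\langle \hat{S} \rangle (S; \sigma^2) = S \langle \hat{S} \rangle (1; (\sigma/S)^2)$ and
$\epsilon (S; \sigma^2) = S \epsilon (1; (\sigma/S)^2)$ for all $S \neq 0$. Again, we can focus on one subthreshold
stimulus and one suprathreshold stimulus to understand the impact of noise (see fig. 6 for
the behavior in the two cases), and use straightforward transformations to obtain the exact
results for other stimuli.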

Further, eq. (14) and eqs. (25) through (28) imply $\epsilon^2 (S; \sigma^2) = S^2 \epsilon^2 (1; (\sigma/S)^2)$,
$\sigma_{\hat{S}_N}^2 (S; \sigma^2) = S^2 \sigma_{\hat{S}_N}^2 (1; (\sigma/S)^2)$, and $\Delta S_N^2 (S; \sigma^2) = S^2 \Delta S_N^2 (1; (\sigma/S)^2)$ for all $S \neq 0$.
Thus, the noise dependence of these latter error sources can be understood with a single
stimulus; see fig. 7. Note that the total error $\Delta S_N^2 (1; (\sigma/S)^2)$ has its minimum for a
nonzero value of the noise variance, analogous to the stochastic resonance effect; see fig. 7.
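
Both claims, the scaling of the total error and the existence of a minimum at nonzero noise, can be checked numerically. The sketch below assumes $S_0 = 0$ and a single neuron by default; the grid of noise variances (0.01 to 10 in steps of 0.01) and the function names are arbitrary choices made here for illustration.

```python
import math


def total_error(stimulus, noise_variance, n_neurons=1):
    """Total squared decoding error of eq. (14) for threshold S_0 = 0."""
    sigma = math.sqrt(noise_variance)
    z = stimulus / (sigma * math.sqrt(2.0))
    p = 0.5 * math.erfc(-z)
    q = 0.5 * math.erfc(z)
    epsilon = math.sqrt(2.0 * math.pi * noise_variance) * (p - 0.5) - stimulus
    variance = 2.0 * math.pi * noise_variance * p * q / n_neurons
    return epsilon ** 2 + variance


def best_noise_variance(stimulus, n_neurons=1, grid=None):
    """Grid search for the noise variance that minimizes the total error."""
    if grid is None:
        grid = [0.01 * k for k in range(1, 1001)]  # assumed search range
    return min(grid, key=lambda v: total_error(stimulus, v, n_neurons))


if __name__ == "__main__":
    # Scaling: error(S; sigma^2) equals S^2 * error(1; (sigma/S)^2).
    print(total_error(2.0, 9.0), 4.0 * total_error(1.0, 9.0 / 4.0))
    # The minimizing noise variance for S = 1 lies strictly inside the grid.
    print(best_noise_variance(1.0))
```

Because of the scaling relation, the minimizing variance for any other stimulus S follows by multiplying the S = 1 result by $S^2$.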

6 Networks of Heterogeneous Neurons 

Thus far, we have focused on single neurons and networks of identical neurons. The effect
of multiple neurons has generally been simple, either having no effect on the single-neuron
values (for $\langle \hat{S} \rangle$ and $\epsilon$) or just rescaling them (for $\langle R_N \rangle$, $\sigma_{R_N}^2$, and $\sigma_{\hat{S}_N}^2$).

A significant exception to this general trend is found in $\Delta S_N^2$. In fig. 8, we show how
$\Delta S_N^2$ varies with N. The error curve flattens out into a broad range of similar values, so
that the presence of noise enhances signal transduction without requiring the precise relation
between $S$ and $\sigma^2$ needed for smaller values of N. This effect is essentially the "stochastic
resonance without tuning" first reported by Collins et al. (1995a).



Informally stated, SR without tuning allows for a wider range of stimuli to be
accurately decoded from the neuron states for any particular value of the noise variance. To
make this notion of "wider range" precise, we again focus our attention on the expected
response of the neurons (see fig. 2). The expected neural response $\langle R_N \rangle$ saturates to zero
or one when $S$ is far from the neuronal threshold. The width $W$ of the intermediate range
can be defined, for example, by taking the boundaries of this range to be the points where
the first order approximation reaches the saturation values of zero and one. The width in
this case becomes $W = \sqrt{2\pi\sigma^2}$.

Other definitions for the response width are, of course, possible, but we still should
observe that the width is proportional to $\sigma$, since the activation probability depends only on
the ratio of $S$ and $\sigma$ (eq. (5)). The same width is found for multiple identical input neurons,
because the output neuron response is proportional to the single neuron's response, without
broadening the curve in fig. 2.

The response width can thus be increased by increasing the noise variance $\sigma^2$. As seen
in figs. 7 and 8, such an increase ultimately leads to a growth in the decoding error $\Delta S_N^2$.
In the asymptotic limit as $\sigma^2$ becomes large, $\Delta S_N^2$ is dominated by $\sigma_{\hat{S}_N}^2$ and we have the
asymptotic behavior

$$ \Delta S_N^2 (S; \sigma^2) = O\left( \frac{\sigma^2}{N} \right) , \tag{29} $$

based on eq. (16). The growth in $\Delta S_N^2$ with increasing $\sigma^2$ thus can be overcome by further
increasing the number of neurons in the input layer. Therefore, the response width $W$ is
effectively constrained by the number of neurons N, with $W = O(\sqrt{N})$ for large N.
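
The square-root scaling can be made explicit with a short worked estimate. The derivation below is a sketch that follows from eqs. (13) through (16) and the width definition $W = \sqrt{2\pi\sigma^2}$; the tolerated error level $\Delta_0$ is a symbol introduced here for illustration and does not appear in the original analysis.

```latex
% For fixed S and large noise variance, p and q both approach 1/2,
% and the expected difference vanishes, so the variance term dominates:
\[
  \epsilon(S;\sigma^2) \to 0 , \qquad
  \Delta S_N^2(S;\sigma^2) \approx \frac{2\pi\sigma^2}{N} \cdot \frac{1}{4}
  = \frac{\pi\sigma^2}{2N} .
\]
% Holding the error at an assumed tolerated level \Delta_0^2 fixes the
% largest usable noise variance for a given N:
\[
  \frac{\pi\sigma^2}{2N} = \Delta_0^2
  \quad\Longrightarrow\quad
  \sigma^2 = \frac{2 N \Delta_0^2}{\pi} .
\]
% Substituting into the width definition gives the square-root growth:
\[
  W = \sqrt{2\pi\sigma^2} = \sqrt{4 N \Delta_0^2} = 2 \Delta_0 \sqrt{N} .
\]
```

Under these assumptions, quadrupling the number of identical neurons only doubles the usable response width at a fixed error level.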

An arbitrary response width can be produced by assembling enough neurons. However,
this approach is inefficient, and greater width increases can be achieved with the same
number of neurons. Consider instead dividing up the total width into M subranges. These
subranges can each be independently covered by a subpopulation of N neurons; all neurons
within a subpopulation are identical to one another, while neurons from different
subpopulations differ only in their thresholds. The width of each subrange is $O(\sqrt{N})$, but the
total width is $O(M\sqrt{N})$. Thus, the total response width can increase more rapidly as
additional subpopulations of neurons are added. Conceptually, multiple thresholds are a way
to provide a wide range of accurate responses, with multiple neurons in each subpopulation
providing independence from any need to "tune" the noise variance to a particular value.

To describe the behavior of neurons with different thresholds, much of the preceding
analysis can be directly applied by translating the functions along the stimulus axis to obtain
the desired threshold. However, system behavior was previously explored near the threshold
value, but heterogeneous populations of neurons have multiple thresholds. Nonetheless, we
can produce a comparable system by simply assessing system behavior near the center of
the total response width.

To facilitate a clean comparison, we set the thresholds in the heterogeneous populations
so that a linear decoding rule can be readily produced. A simple approach that achieves this
is to space the thresholds of the subpopulations by $W$, with all neurons being otherwise
equal. The subpopulations with lower thresholds provide an upward shift in the expected
number of active neurons for higher threshold subpopulations, such that the different
subpopulations are all approximated to first order by the same line. Thus, the expected total
number of active neurons leads to a linear decoding rule by expanding to first order and
inverting, as was done earlier for homogeneous populations. Note that this construction
requires no additional assumptions about how the neural responses are to be interpreted,
nor does it require alterations to the network architecture.

To illustrate the effect of multiple thresholds, we begin by investigating the response
of a homogeneous baseline to a stimulus $S$. The baseline network consists of a single
population (M = 1) of N = 1000 neurons with $S_0 = 0$ and noise variance $\sigma^2 = 1$. Using the definition
above, the response width is $W = \sqrt{2\pi}$. We then consider two cases, homogeneous and
heterogeneous, in each of which we increase the response width by doubling the number of
neurons while maintaining similar error expectations for the decoded stimuli.

In the homogeneous case, we have a single population (M = 1) with N = 2000
neurons. Doubling the number of neurons allows us to double the variance to $\sigma^2 = 2$ with
similar expected errors outside the response width. Thus, we observe an extended range,
relative to the baseline case, in which we can reconstruct the stimulus from the network
output (fig. 9).

In the heterogeneous case, we instead construct two subpopulations (M = 2) with
N = 1000 neurons each. We leave the variance unchanged at $\sigma^2 = 1$. One of the subpopulations
is modified so that the thresholds lie at $+W/2 = \sqrt{\pi/2}$, while the other is modified so that
the thresholds lie at $-W/2 = -\sqrt{\pi/2}$. The resulting neural network has a broad range in
which we can reconstruct the stimulus from the network response, markedly greater than
the baseline and homogeneous cases (fig. 9).
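
The three cases can be compared numerically through the expected decoded stimulus. The text does not write out the decoding rule for heterogeneous networks, so the sketch below assumes one: with M equal-sized subpopulations whose thresholds are spaced by $W$, the estimate is the center of the thresholds plus $W$ times the difference between the summed firing fractions and M/2. This reduces to eq. (13) for M = 1. The function name and the test stimulus S = 2 are likewise illustrative assumptions.

```python
import math


def expected_decoded(stimulus, sigma, thresholds):
    """Expected decoded stimulus for equal-sized subpopulations.

    Assumed rule: thresholds are evenly spaced by W = sqrt(2*pi)*sigma and the
    decoder is linear in the total response. For a single threshold at zero
    this reduces to eq. (13).
    """
    m = len(thresholds)
    width = math.sqrt(2.0 * math.pi) * sigma
    # Sum over subpopulations of the firing probability p(S - S_0; sigma^2).
    fraction = sum(
        0.5 * math.erfc(-(stimulus - s0) / (sigma * math.sqrt(2.0)))
        for s0 in thresholds
    )
    center = sum(thresholds) / m
    return center + width * (fraction - m / 2.0)


if __name__ == "__main__":
    w = math.sqrt(2.0 * math.pi)
    s = 2.0  # a stimulus outside the baseline response width
    print("baseline     ", expected_decoded(s, 1.0, [0.0]))
    print("homogeneous  ", expected_decoded(s, math.sqrt(2.0), [0.0]))
    print("heterogeneous", expected_decoded(s, 1.0, [-w / 2, w / 2]))
```

Only the expected value is compared here; a fuller comparison would also include the decoding variance of each network, as in the total decoding error of eq. (14).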

7 Conclusion 

We have constructed networks of heterogeneous McP neurons that outperform similar net- 
works of homogeneous McP neurons. The network architectures are identical, with the only 
difference being the distribution of neuronal thresholds. The heterogeneous networks are 
sensitive to a wider range of signals than the homogeneous networks. Such networks are 
easily implemented, and could serve as simple models of many diverse natural and artificial 
systems. 

The superior scaling properties of heterogeneous neuronal networks can have a
profound metabolic impact; large numbers of neurons imply a large energetic investment, in
terms of both cellular maintenance and neural activity. The action potentials generated in
neurons can carry a significant energetic cost (Laughlin et al., 1998), making the
trade-off between reliably coding information and the metabolic costs potentially quite important.
Thus, we expect neuronal heterogeneity to be evolutionarily favored, even for quite simple
neural circuits.

Although we have used a specific model consisting of thresholding neurons with addi- 
tive Gaussian noise, we expect that the key result is more widely applicable. The demon- 
stration of the advantage of neuronal heterogeneity largely follows from two factors that 
are not specific to the model neurons. First, the distance of the input stimulus from the 
threshold is proportional to the standard deviation of the Gaussian noise, and, second, the 
total variance of the network response is proportional to the number of input neurons. Ul- 
timately, the heterogeneous thresholds are favorable because the independently distributed 
noise provides a natural scale for the system. 

Figure 1: Statistics of single neuron activation. As the noise variance $\sigma^2$ increases, (a) the
mean activation state $\langle R \rangle (S; \sigma^2)$ takes longer to saturate to the extreme values, while (b)
the variance $\sigma_R^2 (S; \sigma^2)$ of the activation state increases with the noise variance.

Figure 2: First order approximation of the expected activation of a single neuron. Near the
threshold ($S_0 = 0$), the expected activation is nearly linear. Further from the threshold, the
activation saturates at either zero or one and diverges from the linear approximation. The
values shown here are based on noise variance $\sigma^2 = 1$.




Figure 3: Expectation value of the stimulus decoded from the output of a single neuron. As
the noise variance $\sigma^2$ increases, the expectation value of the decoded stimulus approximates
the true value of the stimulus over an interval of increasing width.

Figure 4: (a) Expected difference between the stimulus and the value decoded from the
single-neuron response. The decoded value systematically diverges from the true value as
the input gets farther from the threshold value at zero. (b) Variance of the decoded stimulus
values. Again, the variances shown here are based on decoding the single-neuron response.
The variance of the neuronal noise has been used to scale the variances of the estimates into
a uniform range.

Figure 5: (a) Noise dependence of single-neuron mean activation. For large values of
$(\sigma/S)^2$, the expected activation state asymptotically approaches 1/2. (b) Noise dependence
of single-neuron activation variance. For large values of $(\sigma/S)^2$, the variance
asymptotically approaches 1/4.

Figure 6: (a) Noise dependence of the single-neuron estimated stimulus. For large values
of $(\sigma/S)^2$, the estimates for the subthreshold and suprathreshold signals asymptotically
approach -1 and +1, respectively. (b) Noise dependence of the expected difference. For
large values of $(\sigma/S)^2$, the expected differences asymptotically approach 0 for both the
subthreshold and suprathreshold signals.

Figure 7: Comparison of decoding error sources. The values shown here are calculated
from the response of a single neuron. The minimum in $\Delta S^2$ occurs for a nonzero noise
variance of the signal, as with stochastic resonance.

Figure 8: Effect of the number of neurons on the decoding error. As N becomes large,
the error curve flattens out, indicating a broad range of noise values that all give similar
accuracy in the decoding process.

Figure 9: (a) Expectation value of the decoded output in homogeneous and heterogeneous
networks. The response of the heterogeneous neural network (M = 2, N = 1000) can be
accurately decoded over a broader range than the responses of the baseline (M = 1, N =
1000) and homogeneous (M = 1, N = 2000) networks. (b) Total decoding error for
homogeneous and heterogeneous networks. The heterogeneous neural network has a broader
basin of low error values than the baseline and homogeneous networks.



